Abstract

In this paper we deal with the regression problem in a random
design setting. We investigate asymptotic optimality, from the minimax
point of view, of various Bayesian rules based on warped wavelets and
show that they nearly attain optimal minimax rates of convergence
over the Besov smoothness class considered. Warped wavelets were
introduced recently; they are computationally convenient and easy
to implement while being well adapted to the statistical
problem at hand. We put particular emphasis on Bayesian rules
based on small and large variance Gaussian priors and discuss their
simulation performances, comparing them with a hard thresholding
procedure.
1 Introduction

We observe independent pairs of variables $(X_i, Y_i)$, for $i = 1, \dots, n$, under
a random design regression model:

\[ Y_i = f(X_i) + \varepsilon_i, \qquad 1 \le i \le n, \tag{1} \]

where $f$ is an unknown regression function that we aim at estimating, and
the $\varepsilon_i$ are independent normal errors with $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2 < \infty$. The
design points are assumed to be supported in the interval $[0, 1]$ and to have
a density $g$ which is supposed to be known. Furthermore we assume
that the design density $g$ is bounded from below, i.e. $0 < m \le g$, where $m$
is a constant. Many approaches have been proposed to tackle the problem
of regression in random design; we mention among others the work of Hall
and Turlach [17], Kovac and Silverman [22], Antoniadis et al. [4], Cai and
Brown [8] and the model selection point of view adopted by Baraud [6].
The present paper provides a Bayesian approach to this problem based on
warped wavelet bases. The warped wavelet basis $\{\psi_{jk}(G),\ j \ge -1,\ k \in \mathbb{Z}\}$ for
regression with random design was recently introduced by Kerkyacharian
and Picard in [20]. The authors proposed an approach which departs
as little as possible from standard wavelet thresholding procedures, which
enjoy optimality and adaptivity properties. These procedures have been
extensively investigated in the case of equispaced samples (see the series of
pioneering articles by Donoho et al. [14], [15], [13]). Kerkyacharian and Picard
pointed out that expanding the unknown regression function $f$
in the warped basis instead of the standard wavelet basis could be very
interesting. Of course, this basis is no longer orthonormal;
nonetheless it behaves, under some conditions, like standard wavelets.
Kerkyacharian and Picard investigated the properties of this new basis and
showed that not only is it well adapted to the statistical problem at hand
by avoiding unnecessary calculations, but it also offers very good theoretical
features while being easily implemented. More recently Brutti [7]
highlighted its easy-to-implement computational properties.
The novelty of our contribution lies in the combination of Bayesian
techniques and warped wavelets to treat regression in random design. We
want to investigate whether this yields optimal theoretical results
and promising practical performances, which will prove to be the case. We
do not deal with the case of an unknown design density $g$, which requires
further machinery and will be the object of another paper.
Bayesian techniques for shrinking wavelet coefficients have become very
popular in the last few years. The majority of them were devoted to the fixed
design regression scheme. Let us cite, among others, the papers of Abramovich
et al. [1], [2], Clyde et al. [10], [11], [12], Chipman et al., Rivoirard
[25], and Pensky [24] in the case of i.i.d. errors that are not necessarily Gaussian.
Most of those works take a mixture of Gaussian distributions as prior
distribution. In particular, Abramovich et al. in [1] and [2] have explored
optimality properties of a Gaussian prior mixed with a point mass at zero,
which may be viewed as an extreme case of a Gaussian mixture:

\[ \beta_{jk} \sim \pi_j N(0, \tau_j^2) + (1 - \pi_j)\,\delta(0), \]

where the $\beta_{jk}$ are the wavelet coefficients of the unknown regression function, and
$\tau_j^2 = c_1 2^{-j\alpha}$ and $\pi_j = \min(1, c_2 2^{-j\beta})$ are the hyperparameters. This particular
form was devised to capture the sparsity of the expansion of the signal
in the wavelet basis.

Our approach consists first in using the same prior but in
the context of warped wavelets. In Theorem 1 we show that the Bayesian
estimator built using warped wavelets with this prior and this form of
hyperparameters achieves the optimal minimax rate, up to a logarithmic term,
on the considered Besov functional space. Unfortunately, this Bayesian
estimator turns out not to be adaptive. Indeed, the hyperparameters depend
on the Besov smoothness class index. In order to compensate for this
drawback, Autin et al. in [5] suggested considering Bayesian procedures based
on Gaussian priors with large variance. Following this suggestion, we will
consider priors still specified in terms of a normal density mixed with a
point mass at zero but with large variance Gaussian densities. In Theorem
2 we prove that the Bayesian estimator built with this latter form of
prior, still combined with warped wavelets, achieves a nearly optimal minimax
rate of convergence while being adaptive. Finally, our simulation
results highlight the very good performances and behaviour of these Bayesian
procedures whatever the regularity of the test functions, the noise level and
the design density, which can be far from the uniform case.
This paper is organized as follows. In section 2 some necessary methodology
is given: we start with a short review of wavelets and warped wavelets,
explain the prior model and discuss the two forms of hyperparameters we
consider. We give in section 3 some definitions of the functional spaces we consider.
In section 4, we investigate the performances of our Bayesian estimators in
terms of minimax rates in two cases: the first one when the Gaussian prior
has small variance, the second one when the Gaussian prior has large
variance. Section 5 is devoted to simulation results and discussion. Finally,
all proofs of the main results are given in the Appendix.

2 Methodology

2.1 Warped bases 

Wavelet series are generated by dilations and translations of a function
$\psi$ called the mother wavelet. Let $\phi$ denote the orthogonal father wavelet
function. The functions $\phi$ and $\psi$ are compactly supported. Assume $\psi$ has $r$
vanishing moments. Let:

\[ \phi_{jk}(x) = 2^{j/2}\phi(2^j x - k), \quad j, k \in \mathbb{Z}, \]

\[ \psi_{jk}(x) = 2^{j/2}\psi(2^j x - k), \quad j, k \in \mathbb{Z}. \]

For a given square-integrable function $f$ in $L_2[0, 1]$, let us denote

\[ c_{j,k} = \langle f, \psi_{j,k} \rangle. \]

In this paper, we use decompositions of 1-periodic functions on a wavelet
basis of $L_2[0, 1]$. We consider periodic orthonormal wavelet bases on $[0, 1]$
which allow the following series representation of a function $f$:

\[ f(x) = \sum_{j \ge -1} \sum_{k=0}^{2^j - 1} c_{j,k}\,\psi_{j,k}(x), \tag{2} \]

where we have denoted by $\psi_{-1,k} = \phi_{0,k}$ the scaling function.

We are now going to give the essential background on warped wavelets, which
were introduced in detail in [20]. First of all let us define

\[ G(x) = \int_0^x g(u)\,du. \tag{3} \]

$G$ is assumed to be a known function, continuous and strictly monotone
from $[0,1]$ to $[0,1]$.

Let us expand the regression function $f$ in the following sense:

\[ f(G^{-1})(x) = \sum_{j \ge -1} \sum_{k=0}^{2^j-1} \beta_{jk}\,\psi_{jk}(x) \]

or equivalently

\[ f(x) = \sum_{j \ge -1} \sum_{k=0}^{2^j-1} \beta_{jk}\,\psi_{jk}(G(x)) \]

where

\[ \beta_{jk} = \int f(G^{-1})(x)\,\psi_{jk}(x)\,dx = \int f(x)\,\psi_{jk}(G(x))\,g(x)\,dx. \]

Hence one immediately notices that expanding $f(G^{-1})$ in the standard
basis is equivalent to expanding $f$ in the new warped wavelet basis $\{\psi_{jk}(G),\ j \ge -1,\ k \in \mathbb{Z}\}$.
This gives a natural explanation of why, in what follows,
regularity conditions will be expressed not for $f$ but for $f(G^{-1})$.

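We set $\hat\beta_{jk} = (1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))\,Y_i$. $\hat\beta_{jk}$ is an unbiased estimate of
$\beta_{jk}$ since

\[ E(\hat\beta_{jk}) = (1/n)\sum_{i=1}^{n} E\big(\psi_{jk}(G(X_i))(f(X_i) + \varepsilon_i)\big) = E\big(\psi_{jk}(G(X))f(X)\big) \]
\[ = \int f(x)\,\psi_{jk}(G(x))\,g(x)\,dx = \int f(G^{-1})(x)\,\psi_{jk}(x)\,dx = \beta_{jk}. \]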
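
The empirical coefficient above can be illustrated numerically. The following is a minimal sketch, not the implementation used in the paper: it assumes a Haar wavelet (the simulations in section 5 use a Symmlet 8), a hypothetical design density $g(x) = 0.5 + x$, and hypothetical helper names `haar_psi`, `G`, `G_inv`, `simulate` and `empirical_coefficient`.

```python
import math
import random


def haar_psi(j, k, u):
    """Haar wavelet psi_{jk}(u) = 2^{j/2} psi(2^j u - k), for j >= 0 (assumed wavelet)."""
    t = (2 ** j) * u - k
    if 0.0 <= t < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= t < 1.0:
        return -(2 ** (j / 2))
    return 0.0


def G(x):
    """Design c.d.f. for the illustrative density g(x) = 0.5 + x on [0, 1]."""
    return 0.5 * x + 0.5 * x * x


def G_inv(u):
    """Inverse of G, used to draw design points by inversion."""
    return -0.5 + math.sqrt(0.25 + 2.0 * u)


def simulate(f, n, sigma, seed):
    """Draw (X_i, Y_i) from Y_i = f(X_i) + eps_i with X_i ~ g and Gaussian noise."""
    rng = random.Random(seed)
    xs = [G_inv(rng.random()) for _ in range(n)]
    ys = [f(x) + sigma * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def empirical_coefficient(j, k, xs, ys):
    """beta_hat_{jk} = (1/n) sum_i psi_{jk}(G(X_i)) Y_i."""
    n = len(xs)
    return sum(haar_psi(j, k, G(x)) * y for x, y in zip(xs, ys)) / n
```

As a usage example, taking $f = \psi_{2,1}(G)$ gives a true warped coefficient $\beta_{2,1} = 1$ and $\beta_{2,0} = 0$, so the empirical coefficients computed from a large simulated sample should be close to these values.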

2.2 Priors and estimators 

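We set in the following

\[ \gamma_{jk}^2 = \frac{\sigma^2}{n^2} \sum_{i=1}^{n} \psi_{jk}^2(G(X_i)). \tag{4} \]

As in Abramovich et al. (see [1], [2]), we use the following prior on the
wavelet coefficients $\beta_{jk}$ of the unknown function $f$ with respect to the warped
basis $\{\psi_{jk}(G),\ j \ge -1,\ k \in \mathbb{Z}\}$:

\[ \beta_{jk} \sim \pi_j N(0, \tau_j^2) + (1 - \pi_j)\,\delta(0). \]

Considering the $L_1$ loss, from this form of prior we derive the following
Bayesian rule, which is the posterior median:

\[ \tilde\beta_{jk} = \mathrm{Med}(\beta_{jk} \mid \hat\beta_{jk}) = \mathrm{sign}(\hat\beta_{jk}) \max(0, \zeta_{jk}) \tag{5} \]

where

\[ \zeta_{jk} = \frac{\tau_j^2}{\gamma_{jk}^2 + \tau_j^2}\,|\hat\beta_{jk}| - \frac{\tau_j \gamma_{jk}}{\sqrt{\gamma_{jk}^2 + \tau_j^2}}\, \Phi^{-1}\!\left(\frac{1 + \min(\eta_{jk}, 1)}{2}\right), \tag{6} \]

where $\Phi$ is the normal cumulative distribution function and

\[ \eta_{jk} = \frac{1 - \pi_j}{\pi_j}\, \frac{\sqrt{\tau_j^2 + \gamma_{jk}^2}}{\gamma_{jk}} \exp\left\{ -\frac{\tau_j^2\, \hat\beta_{jk}^2}{2\gamma_{jk}^2(\tau_j^2 + \gamma_{jk}^2)} \right\}. \tag{7} \]

We set

\[ w_j(n) := \frac{\pi_j}{1 - \pi_j}. \tag{8} \]

We now introduce the estimator of the unknown regression function $f$:

\[ \hat f_J(x) = \sum_{j \le J} \sum_{k=0}^{2^j - 1} \tilde\beta_{jk}\,\psi_{jk}(G(x)), \tag{9} \]

where $J$ is a parameter which will be specified later.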

Note that in our case, the estimator resembles the usual ones in [5], [1]
and [2], except that the deterministic noise variance has been replaced by
a stochastic noise level $\gamma_{jk}^2$, whose expression is given by (4). This change
has a marked impact both on the proofs of the theorems, which now rely on large
deviation inequalities, and on the simulation results.

Furthermore, such a rule is of thresholding type. Indeed, as underlined in
[1] and [2], $\tilde\beta_{jk}$ is zero whenever $|\hat\beta_{jk}|$ falls below a certain threshold $\lambda_B$. Some
properties of the threshold $\lambda_B$ that will be used in the sequel are given in
Lemma 3 in the Appendix.
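
To make the thresholding behaviour of the posterior median concrete, here is a minimal sketch of a rule of the form described in this section, written from the standard posterior-median formula for a Gaussian prior mixed with a point mass at zero as in Abramovich et al. [1]. The function name `posterior_median` and its argument names are hypothetical, and the noise level `gamma2` is simply passed in as a number.

```python
import math
from statistics import NormalDist


def posterior_median(beta_hat, gamma2, tau2, pi):
    """Posterior median of beta when beta_hat | beta ~ N(beta, gamma2)
    and beta ~ pi * N(0, tau2) + (1 - pi) * delta_0."""
    if pi <= 0.0:
        return 0.0  # prior is a point mass at zero
    gamma = math.sqrt(gamma2)
    total = gamma2 + tau2
    if pi >= 1.0:
        eta = 0.0  # no point mass: pure linear shrinkage
    else:
        eta = ((1.0 - pi) / pi) * (math.sqrt(total) / gamma) * math.exp(
            -tau2 * beta_hat ** 2 / (2.0 * gamma2 * total)
        )
    if eta >= 1.0:
        return 0.0  # the quantile term is infinite, so the rule returns zero
    quantile = NormalDist().inv_cdf((1.0 + eta) / 2.0)
    zeta = (tau2 / total) * abs(beta_hat) - math.sqrt(tau2) * gamma / math.sqrt(total) * quantile
    return math.copysign(max(0.0, zeta), beta_hat)
```

Small observations are set exactly to zero, while large ones are shrunk by a factor close to $\tau^2/(\gamma^2 + \tau^2)$; for instance with `gamma2 = tau2 = 1` and `pi = 0.5`, an observation of 10 is mapped to approximately 5.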



2.2.1 Gaussian priors with small variance 

In this paper, two cases of hyperparameters will be considered. The first one
involves Gaussian priors with small variances. We set, as suggested
in Abramovich et al. (see [1], [2]):

\[ \tau_j^2 = c_1 2^{-j\alpha}, \qquad \pi_j = \min(1, c_2 2^{-j\beta}), \tag{10} \]

where $\alpha$ and $\beta$ are non-negative constants, and $c_1, c_2 > 0$.

This choice of hyperparameters is exhaustively discussed in Abramovich
et al. [2]. The authors stressed that this form of hyperparameters was
designed in order to capture the sparsity of wavelet expansions.
They pointed out the connection between the parameters of Besov spaces and
this particular form of hyperparameters, and investigated various practical
choices.

For this case of hyperparameters (10), the estimator of $f$ will be denoted
$\tilde f$.
2.2.2 Gaussian priors with large variance

The second form of hyperparameters considered in the paper involves
Gaussian priors with large variance, as suggested in Autin et al. [5].
We suppose that the hyperparameters do not depend
on $j$ and we set:

\[ \tau_j^2 := \tau(n)^2 = 1/\sqrt{n\log(n)}. \tag{11} \]

Besides, $w_j(n) := w(n)$. We suppose that there exist $q_1$ and $q_2$ such that
for $n$ large enough

\[ n^{-q_1/2} \le w(n) \le n^{-q_2/2}. \tag{12} \]

This form of hyperparameters was emphasized in [5] in order to mimic
heavy-tailed priors such as Laplace or Cauchy distributions. Indeed, Johnstone
and Silverman in [18], [19] showed that their empirical Bayes approach for
the regular regression setting, with a prior mixing a heavy-tailed density and a
point mass at zero, proved fruitful both in theory and practice. Pensky in
[21] also underlined the efficiency of this kind of hyperparameters.

We underscore that, contrary to the first form of hyperparameters (10), the
latter forms (11) and (12) lead to an adaptive Bayesian estimator.

For this case of hyperparameters (11) and (12), the estimator of $f$ will be
denoted $\hat f$.
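
The two hyperparameter regimes can be compared side by side with a short sketch. It assumes the forms $\tau_j^2 = c_1 2^{-j\alpha}$, $\pi_j = \min(1, c_2 2^{-j\beta})$ for the small variance case and $\tau(n)^2 = 1/\sqrt{n\log(n)}$, $w(n) = n^{-q/2}$ for the large variance case; recovering $\pi$ from $w$ through $w = \pi/(1-\pi)$ is also an assumption of this sketch, and both function names are hypothetical.

```python
import math


def small_variance_hyperparameters(j, alpha, beta, c1, c2):
    """Level-dependent hyperparameters: tau_j^2 = c1 2^{-j alpha}, pi_j = min(1, c2 2^{-j beta})."""
    tau2 = c1 * 2.0 ** (-alpha * j)
    pi = min(1.0, c2 * 2.0 ** (-beta * j))
    return tau2, pi


def large_variance_hyperparameters(n, q):
    """Level-free hyperparameters: tau(n)^2 = 1/sqrt(n log n), w(n) = n^{-q/2}.

    The mixing weight is recovered from w = pi / (1 - pi) (assumed relation).
    """
    tau2 = 1.0 / math.sqrt(n * math.log(n))
    w = n ** (-q / 2.0)
    pi = w / (1.0 + w)
    return tau2, w, pi
```

With $n = 1024$ the large variance prior has $\tau(n)^2 \approx 0.012$, which is indeed much larger than the noise variance of order $1/n \approx 0.001$, whereas the small variance prior decays geometrically with the level $j$.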



3 Functional spaces 

In this paper, the functional classes of interest are Besov bodies and weak Besov
bodies. Let us define them. Using the wavelet decomposition above, we characterize
Besov spaces by the following norm:

\[ \|f\|_{s,p,q} = \begin{cases} \Big[\sum_{j \ge -1} 2^{jq(s+1/2-1/p)}\, \|(\beta_{j,k})_k\|_{\ell_p}^q\Big]^{1/q} & \text{if } q < \infty, \\ \sup_{j \ge -1} 2^{j(s+1/2-1/p)}\, \|(\beta_{j,k})_k\|_{\ell_p} & \text{if } q = \infty. \end{cases} \]

If $\max(0, 1/p - 1/2) < s < r$ and $p, q \ge 1$,

\[ f \in B^s_{p,q} \iff \|f\|_{s,p,q} < \infty. \]

The Besov spaces have the following simple relationship

\[ B^{s_1}_{p,q_1} \subset B^{s}_{p,q}, \quad \text{for } s_1 > s \text{ or } s_1 = s \text{ and } q_1 \le q, \]

and

\[ B^{s}_{p,q} \subset B^{s_1}_{p_1,q}, \quad \text{for } p_1 > p \text{ and } s_1 \le s - 1/p + 1/p_1. \]

The index $s$ indicates the smoothness of the function. The Besov spaces
capture a variety of smoothness features in a function, including spatially
inhomogeneous behavior when $p < 2$.

We recall and stress that in this paper, as mentioned above, the regularity
conditions will be expressed for the function $f(G^{-1})$ due to the warped basis
context.

More precisely we shall focus on the space $B^s_{2,\infty}$. We have in particular

\[ f \in B^s_{2,\infty} \iff \sup_{j \ge -1} 2^{2js} \sum_k \beta_{jk}^2 < \infty. \tag{13} \]

We define the Besov ball of radius $R$ as $B^s_{2,\infty}(R) = \{f : \|f\|_{s,2,\infty} \le R\}$.
Let us now define the weak Besov space $W(r,2)$.

Definition 1. Let $0 < r < 2$. We say that a function $f$ belongs to the weak
Besov body $W(r, 2)$ if and only if:

\[ \|f\|_{W_r} := \Big[\sup_{\lambda > 0} \lambda^{r-2} \sum_{j \ge -1} \sum_k \beta_{jk}^2\, I\{|\beta_{jk}| \le \lambda\}\Big]^{1/2} < \infty. \tag{14} \]

And we have the following proposition.

Proposition 1. Let $0 < r < 2$ and $f \in W(r, 2)$. Then

\[ \sup_{\lambda > 0} \lambda^{r} \sum_{j \ge -1} \sum_k I\{|\beta_{jk}| > \lambda\} \le \frac{2^{2-r}\,\|f\|_{W_r}^2}{1 - 2^{-r}}. \tag{15} \]

For the proof of this proposition see for instance [?].
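
For the reader's convenience, the following is a short sketch of a standard dyadic-slicing argument that yields a bound of the form (15). It is offered as an illustration under the definition (14) and is not taken from the reference cited above; the constant it produces, $2^{2-r}/(1-2^{-r})$, is the one that arises from this particular slicing.

```latex
% Slice the set {|beta_jk| > lambda} into dyadic shells and use (14) on each shell.
\begin{align*}
\sum_{j,k} I\{|\beta_{jk}| > \lambda\}
  &= \sum_{l \ge 0} \sum_{j,k} I\{2^{l}\lambda < |\beta_{jk}| \le 2^{l+1}\lambda\} \\
  &\le \sum_{l \ge 0} (2^{l}\lambda)^{-2} \sum_{j,k} \beta_{jk}^2\, I\{|\beta_{jk}| \le 2^{l+1}\lambda\} \\
  &\le \sum_{l \ge 0} (2^{l}\lambda)^{-2}\, (2^{l+1}\lambda)^{2-r}\, \|f\|_{W_r}^2 \\
  &= \lambda^{-r}\, 2^{2-r} \sum_{l \ge 0} 2^{-lr}\, \|f\|_{W_r}^2
   = \lambda^{-r}\, \frac{2^{2-r}}{1 - 2^{-r}}\, \|f\|_{W_r}^2 .
\end{align*}
```

Multiplying both sides by $\lambda^r$ and taking the supremum over $\lambda > 0$ gives the stated inequality.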
To conclude this section, we have the following embedding

\[ B^s_{2,\infty} \subset W(2/(1+2s), 2), \]

which is not difficult to prove (see for instance [21]).

4 Minimax performances of the procedures 

4.1 Bayesian estimators based on Gaussian priors with 
small variances 

Theorem 1. Assume that we observe model (1). We consider the
hyperparameters defined by (10). Set $J := J_\alpha$ such that $2^{J_\alpha} = (3/(2n))^{-1/\alpha}$.
Let $\alpha > 1$ and $\alpha > s$, then we have the following upper bound:

\[ \sup_{f(G^{-1}) \in B^s_{2,\infty}(R)} E\|\tilde f - f\|_2^2 = O\big((1/n)^{1-1/\alpha}\log^2(n)\big) + O\big((1/n)^{2s/\alpha}\big). \tag{16} \]

The optimal choice of the hyperparameter $\alpha$ in Theorem 1 should
minimize the upper bound derived in (16). Consequently, choosing
$\alpha = 2s + 1$ in (16), we immediately deduce the following corollary.

Corollary 1. If one chooses $\alpha = 2s + 1$ for the prior parameter, one gets

\[ \sup_{f(G^{-1}) \in B^s_{2,\infty}(R)} E\|\tilde f - f\|_2^2 = O\big(\log^2(n)\, n^{-2s/(2s+1)}\big). \]

This corollary shows that with this specific choice of the hyperparameter
$\alpha$, one recovers, up to a logarithmic factor, the minimax rate of convergence
that one achieves in a uniform design.
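
The balancing of the two terms in the upper bound of Theorem 1 can be checked with exact arithmetic. The sketch below assumes the two polynomial exponents are $1 - 1/\alpha$ and $2s/\alpha$ (logarithmic factors are ignored); the function names are hypothetical.

```python
from fractions import Fraction


def rate_exponents(s, alpha):
    """Exponents of 1/n in the two terms of the bound: 1 - 1/alpha and 2s/alpha."""
    s, alpha = Fraction(s), Fraction(alpha)
    return 1 - 1 / alpha, 2 * s / alpha


def overall_exponent(s, alpha):
    """The bound is driven by the slower term, i.e. the smaller exponent."""
    return min(rate_exponents(s, alpha))


def balanced_alpha(s):
    """Value of alpha that equates the two exponents: alpha = 2s + 1."""
    return 2 * Fraction(s) + 1
```

For $s = 1$, the choice $\alpha = 3$ gives both exponents equal to $2/3 = 2s/(2s+1)$, while $\alpha = 2$ or $\alpha = 4$ only gives the exponent $1/2$.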

4.2 Bayesian estimators based on Gaussian priors with 
large variance 

Theorem 2. We consider the model (1). We assume that the hyperparameters
are defined by (11) and (12). Set $J := J_n$ such that $2^{J_n} = n/\log n$,
then we have:

\[ \sup_{f(G^{-1}) \in B^s_{2,\infty}(R)} E\|\hat f - f\|_2^2 \le C \left(\frac{\log(n)}{n}\right)^{2s/(2s+1)}. \]

It is worthwhile to make some comments about the results of Theorem
2. Here, the estimator turns out to be adaptive and, contrary to the similar
results in Proposition 2 in [20], we no longer have the limitation $s > 1/2$ on the
regularity index. Moreover, Kerkyacharian and Picard [20] had to
stop at the highest level $J$ such that $2^J = (n/\log(n))^{1/2}$, whereas here we stop at the
usual level $J_n$ such that $2^{J_n} = n/\log(n)$, as in standard thresholding.

5 Simulations and discussion 



A simulation study is conducted in order to compare the numerical
performances of the two Bayesian estimators based on warped wavelets and on
Gaussian priors with small or large variance, described respectively in
sections 2.2.1 and 2.2.2, with the hard thresholding procedure based on the
warped basis and a universal threshold depending on $\log(n)$, introduced by Kerkyacharian
and Picard [20] for the nonparametric regression model in a random
design setting. For more details on the Kerkyacharian and Picard procedure, the
reader is referred to Willer [26], see also [16]. We have decided
to concentrate on the procedure of Kerkyacharian and Picard because it is
interesting to point out the differences between, and compare the performances of,
Bayesian procedures which apply local thresholds and a universal threshold
procedure.

The main difficulty lies in implementing the Bayesian procedures with the
stochastic variance (4). Note also the responses proposed by Amato et al.
[3] and Kovac and Silverman [22].

All the simulations in the present paper have been conducted with
MATLAB and the Wavelet toolbox of MATLAB.

We consider here four test functions of Donoho and Johnstone [13]
representing different levels of spatial variability. The test functions are plotted
in Fig. 1. For each of the four objects under study, we compare the three
estimators at two noise levels, one with signal-to-noise ratio RSNR = 4
and another with RSNR = 7. As in Willer [26] we also consider different
cases of design density, which are plotted in Fig. 2. The first two densities
are uniform or slightly varying, whereas the last two aim at depicting
the case where a hole occurs in the design density. The sample size is equal
to $n = 1024$ and the wavelet we used is the Symmlet 8.
In order to compare the behaviors of the estimators, the RMSE criterion
was retained. More precisely, if $\hat f(X_i)$ is the estimated function value at $X_i$
and $n$ is the sample size, then

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \big(\hat f(X_i) - f(X_i)\big)^2}. \tag{17} \]

The RMSE displayed in Tab. 1 are computed as the average over 100 runs
of expression (17). In each run, we hold all factors constant, except the
design points (random design) and the noise process, which were regenerated.
E1 corresponds to the Bayesian estimator based on a Gaussian prior with
large variance, E2 to the Bayesian estimator based on a Gaussian prior with
small variance and E3 to the estimator built following the Kerkyacharian
and Picard procedure in [20].
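
The evaluation criterion can be sketched in a few lines. This is only an illustration of the root mean squared error averaged over runs (the simulations themselves were run in MATLAB); the function names `rmse` and `average_rmse` are hypothetical.

```python
import math


def rmse(estimates, truths):
    """Root mean squared error between estimated and true function values at the design points."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)


def average_rmse(runs):
    """Average of the per-run RMSE; each run is a pair (estimates, truths)."""
    return sum(rmse(e, t) for e, t in runs) / len(runs)
```

In the setting of the paper, `runs` would contain 100 pairs, each obtained from a fresh draw of the design points and of the noise.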



In order to implement E1, we made the following choices for the
hyperparameters described in section 2.2.2: in (12), $q_1 = q_2 = q = 1$ proved to be a
good compromise whatever the function to be estimated, while
leading to good graphical reconstructions. We set $w(n) = 20 \times n^{-q/2}$ and
$\tau(n)^2 = 20 \times \sigma^2/(n\log(n))$. To implement E2, we set $c_1 = 1$, $c_2 = 2$, $\alpha = 0.5$
and $\beta = 1$, following the choices recommended in [2].

The following plots compare the visual quality of the reconstructions (see
Fig. 3 to Fig. 8). The solid line is the estimator and the dotted line is the
true function.







                                RSNR=4                     RSNR=7
Function    Design density   E1      E2      E3      E1      E2      E3
Blocks      Sine             0.0194  0.0219  0.0227  0.0113  0.0161  0.0129
            Hole2            0.0196  0.0220  0.0226  0.0114  0.0163  0.0130
Bumps       Sine             0.0243  0.240   0.259   0.0156  0.0167  0.0172
            Hole2            0.0241  0.0237  0.0253  0.0155  0.0167  0.0169
HeaviSine   Sine             0.0164  0.0141  0.0133  0.0103  0.0092  0.0093
            Hole2            0.0169  0.0146  0.0138  0.0107  0.0097  0.0096
Doppler     Sine             0.0236  0.0231  0.0236  0.0157  0.0238  0.0248
            Hole2            0.0244  0.0238  0.0248  0.0166  0.0172  0.0176

Table 1: Values of RMSE over 100 runs
Fig. 1 Test functions (panels: Blocks, Bumps, HeaviSine, Doppler)

Fig. 2 Design density (panels: constant, Sine, Hole1, Hole2)

Fig. 3 Blocks target and Sine density, RSNR=4

Fig. 4 Blocks target and Hole2 design density, RSNR

Fig. 5 Blocks target and Hole2 design density, RSNR=7

Fig. 6 Bumps target and Sine design density, RSNR=4

Fig. 7 HeaviSine target and Sine design density, RSNR=7

Fig. 8 Doppler target and Hole2 design density, RSNR=4

(In Figs. 3 to 8 the panels are E1, E2 and E3.)

We shall now comment on and discuss the results displayed in Tab. 1 as
well as the various visual reconstructions.

The performances are always better for the Bayesian estimators, except in
the case of the HeaviSine test function. More precisely, the RMSE for Blocks,
whatever the noise level and design density, is smaller for Estimator 1;
moreover the RMSE are almost equal for Estimators 1 and 2 in the case
of the Bumps test function, whatever the design density and for a noise level
RSNR=4. This may be due to the irregularity of the Bumps, Blocks and
Doppler test functions, which are much rougher than the HeaviSine.
Indeed, Estimators 1 and 2 tend to detect better the corners of
Blocks, the high peaks in Bumps, and the high frequency parts of Doppler,
as the graphics show. We may explain this by the fact that Estimators
1 and 2 have level-dependent thresholds whereas Estimator 3 has a hard
universal threshold.

As for the reconstructions, one can see that they are slightly better in the
case of the Sine density and small noise, whereas there are small deteriorations
when a hole occurs in the design density, but this change does not affect
the visual quality too much. This fact highlights the interest
of "warping" the wavelet basis. Warping the basis allows the estimators
to still behave correctly when the design density is far from the uniform
density, as in the case of Hole2.
6 Appendix

In the sequel $C$ denotes some positive constant which may change from one
line to another. We also assume without loss of generality that $\sigma = 1$
in model (1).
We have that

\[ E\big(\psi_{jk}^2(G(X))\big) = \int \psi_{jk}^2(G(x))\,g(x)\,dx = \int \psi_{jk}^2(y)\,dy = 1, \]

hence we get $E(\gamma_{jk}^2) = 1/n$, the expression of $\gamma_{jk}^2$ being given by (4).
Let us define the following event:

\[ \Omega_n^{\delta} = \{|\gamma_{jk}^2 - 1/n| \le \delta\}. \tag{18} \]

To make the proofs clearer we recall the Bernstein inequality that we will use
in the sequel (see Proposition 2.8 and formula (2.16) in [23]).

Proposition 2. Let $Z_1, \dots, Z_n$ be independent and square integrable random
variables such that for some nonnegative constant $b$, $Z_i \le b$ almost
surely for all $i \le n$. Let

\[ S = \sum_{i=1}^{n} (Z_i - E[Z_i]) \]

and $v = \sum_{i=1}^{n} E(Z_i^2)$. Then for any positive $x$, we have

\[ P[S \ge x] \le \exp\left(-\frac{v}{b^2}\, h\!\left(\frac{bx}{v}\right)\right) \]

where $h(u) = (1 + u)\log(1 + u) - u$.
It is easy to prove that

\[ h(u) \ge \frac{u^2}{2(1 + u/3)} \]

which immediately yields

\[ P[S \ge x] \le \exp\left(-\frac{x^2}{2(v + bx/3)}\right). \]
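
The passage from the Bennett-type bound to the Bernstein-type bound rests on the elementary inequality $h(u) \ge u^2/(2(1+u/3))$. The following sketch evaluates both bounds numerically so that the inequality can be checked on a grid; it is an illustration only, and the function names are hypothetical.

```python
import math


def h(u):
    """h(u) = (1 + u) log(1 + u) - u, for u >= 0."""
    return (1.0 + u) * math.log1p(u) - u


def bennett_bound(x, v, b):
    """exp(-(v / b^2) h(b x / v)): the bound stated with the function h."""
    return math.exp(-(v / b ** 2) * h(b * x / v))


def bernstein_bound(x, v, b):
    """exp(-x^2 / (2 (v + b x / 3))): the weaker but more explicit bound."""
    return math.exp(-x ** 2 / (2.0 * (v + b * x / 3.0)))
```

For every $x > 0$ the first bound is at most the second, which is why the explicit form can be used throughout the proofs without loss.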



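Lemma 1. Let $q$ be some positive constant. We have

\[ P(|\gamma_{jk}^2 - 1/n| > q/n) \le 2e^{-Cn^{1-1/\alpha}}, \quad \forall j \le J_\alpha, \tag{19} \]
\[ P(|\gamma_{jk}^2 - 1/n| > q/n) \le 2e^{-q^2\log(n)/(C\|\psi\|_4^4 + q\|\psi\|_\infty^2)}, \quad \forall j \le J_n. \tag{20} \]

Proof of Lemma 1.

Let us deal with the first case $j \le J_\alpha$. To bound $P(|\gamma_{jk}^2 - 1/n| > q/n)$ we
will use the Bernstein inequality and apply Proposition 2. In the present
situation $Z_i = (1/n^2)\,\psi_{jk}^2(G(X_i))$.

First of all, in order to apply the Bernstein inequality, we need the value of
the sum

\[ v = \sum_{i=1}^{n} E\big[\big((1/n^2)\,\psi_{jk}^2(G(X_i))\big)^2\big]. \]

We have

\[ E\,\psi_{jk}^4(G(X)) = \int_0^1 \psi_{jk}^4(G(x))\,g(x)\,dx = \int_0^1 \psi_{jk}^4(y)\,dy = \int_0^1 2^{2j}\psi^4(2^j y - k)\,dy \le 2^j \int \psi^4(y)\,dy \le C\,2^j\,\|\psi\|_4^4, \tag{21} \]

hence

\[ (1/n^4)\sum_{i=1}^{n} E\,\psi_{jk}^4(G(X_i)) \le (C/n^3)\,2^{J_\alpha} \le C\,n^{-3+1/\alpha}; \]

moreover

\[ (1/n^2)\,\psi_{jk}^2(G(X_i)) \le \|\psi\|_\infty^2\, 2^{J_\alpha}/n^2 \le C\,n^{-2+1/\alpha}, \]

so

\[ P(|\gamma_{jk}^2 - 1/n| > q/n) \le 2\exp\left(-\frac{q^2 n^{-2}}{2C(1 + q/3)\,n^{-3+1/\alpha}}\right) = 2\exp\left(-\frac{q^2\, n^{1-1/\alpha}}{2C(1+q/3)}\right). \]

Let us now deal with the second case $j \le J_n$. To bound $P(|\gamma_{jk}^2 - 1/n| > q/n)$
we follow the lines of the proof of the first case. Here again

\[ Z_i = (1/n^2)\,\psi_{jk}^2(G(X_i)). \]

According to (21), we have

\[ E\big((1/n^4)\,\psi_{jk}^4(G(X))\big) \le C\,2^j\|\psi\|_4^4/n^4 \le C\|\psi\|_4^4/(n^3\log(n)), \quad j \le J_n, \]

and

\[ \sum_{i=1}^{n} E\big((1/n^4)\,\psi_{jk}^4(G(X_i))\big) \le C\|\psi\|_4^4/(n^2\log(n)), \]

and

\[ (1/n^2)\,\psi_{jk}^2(G(X)) \le \|\psi\|_\infty^2\, 2^j/n^2 \le \|\psi\|_\infty^2/(n\log(n)), \quad j \le J_n, \]

consequently

\[ P(|\gamma_{jk}^2 - 1/n| > q/n) \le 2\exp\left(-\frac{q^2\log(n)}{C\|\psi\|_4^4 + q\|\psi\|_\infty^2}\right). \]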

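The following lemma shows that the properties of the Bayesian estimators
$\tilde f$ and $\hat f$ can be controlled on the event $\Omega_n^{\delta}$. To lighten the notation in the
proof of this lemma, we write $\Omega_n$ for $\Omega_n^{\delta}$ and $\Omega_n^c$ for the complement
of $\Omega_n$.

Lemma 2. We have

\[ E\big[I(\Omega_n^c)\,\|\hat f - f\|_2^2\big] = o\big((\log(n)/n)^{2s/(2s+1)}\big), \]

\[ E\big[I(\Omega_n^c)\,\|\tilde f - f\|_2^2\big] = o\big((1/n)^{1-1/\alpha}\log(n)\big). \]

Proof of Lemma 2.

We have

\[ E\big[I(\Omega_n^c)\,\|\hat f - f\|_2^2\big] \le 2\Big(C J_n\, E\Big[\sum_{j \le J_n, k} (\tilde\beta_{jk} - \beta_{jk})^2\, I(\Omega_n^c)\Big] + E\Big[\Big\|\sum_{j > J_n}\sum_k \beta_{jk}\psi_{jk}(G)\Big\|_2^2\, I(\Omega_n^c)\Big]\Big) \le V + B. \]

Let us first deal with the variance term $V$. The estimator $\tilde\beta_{jk}$ can be
written as $\tilde\beta_{jk} = w_{jk}\hat\beta_{jk}$ with $0 \le w_{jk} \le 1$. We have

\[ V \le C J_n\, E\Big[\sum_{j \le J_n, k} \big(w_{jk}(\hat\beta_{jk} - \beta_{jk}) - (1 - w_{jk})\beta_{jk}\big)^2\, I(\Omega_n^c)\Big] \]
\[ \le 2C J_n\, E\Big[\sum_{j \le J_n}\sum_k \big(w_{jk}^2(\hat\beta_{jk} - \beta_{jk})^2 + (1 - w_{jk})^2\beta_{jk}^2\big)\, I(\Omega_n^c)\Big] \]
\[ \le 2C J_n\, E\Big[\sum_{j \le J_n}\sum_k \big((\hat\beta_{jk} - \beta_{jk})^2 + \beta_{jk}^2\big)\, I(\Omega_n^c)\Big] \]

because $0 \le w_{jk} \le 1$. Then, using the Cauchy-Schwarz inequality, we get

\[ V \le 2C J_n \sum_{j \le J_n}\sum_k \big[E(\hat\beta_{jk} - \beta_{jk})^4\big]^{1/2} P(\Omega_n^c)^{1/2} + 2C J_n\, \|f(G^{-1})\|_2^2\, P(\Omega_n^c). \]

Using (20) and (40) we have

\[ V \le 2C J_n\, 2^{J_n}\, e^{-q^2\log(n)/(2(C\|\psi\|_4^4 + q\|\psi\|_\infty^2))}/n + 2C J_n\, \|f(G^{-1})\|_2^2\, e^{-q^2\log(n)/(C\|\psi\|_4^4 + q\|\psi\|_\infty^2)}. \]

We recall that $2^{J_n} = n/\log(n)$; accordingly, by choosing $q$ large enough we
have

\[ V = o\big((\log(n)/n)^{2s/(2s+1)}\big). \]

As for the term $B$,

\[ B \le C\,2^{-2J_n s}\, e^{-q^2\log(n)/(C\|\psi\|_4^4 + q\|\psi\|_\infty^2)}, \]

which completes the proof for $\hat f$.

The proof for $\tilde f$ is similar; all inequalities hold a fortiori since, in the case
of the estimator $\tilde f$, we have $P(\Omega_n^c) \le e^{-Cn^{1-1/\alpha}}$ (see (19)).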

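Let us place ourselves in the setting of Theorem 1. We recall that $\tilde\beta_{jk}$ is zero
whenever $|\hat\beta_{jk}|$ falls below a threshold $\lambda_B$, and we have the following lemma
concerning the behavior of $\lambda_B$.

Lemma 3. On the event $\Omega_n^{\delta}$ defined by (18) with $\delta = 1/(2n)$, for $\alpha > 1$ we
have

\[ \lambda_B \approx \sqrt{\frac{\log(n)}{n}}, \quad j \le J_\alpha, \tag{22} \]

where $J_\alpha$ is taken such that $2^{J_\alpha} = (3/(2n))^{-1/\alpha}$.

Proof of Lemma 3. We follow the lines of the proof of lemma 1 in [1].
On the one hand we have (see the proof of lemma 1 in [1], page 228)

\[ \lambda_B^2 \le \frac{2\gamma_{jk}^2(\gamma_{jk}^2 + \tau_j^2)}{\tau_j^2} \log\left(\frac{1 - \pi_j}{\pi_j}\, \frac{\sqrt{\gamma_{jk}^2 + \tau_j^2}}{\gamma_{jk}} + c\right) \]

where $c$ is some suitable positive constant. Besides, we have $1/(2n) \le \gamma_{jk}^2 \le 3/(2n)$, therefore

\[ \lambda_B^2 \le \frac{2(3/(2n))\big((3/(2n)) + c_1(3/(2n))\big)}{c_1(3/(2n))} \log\left(\frac{1 - c_2(3/(2n))^{\beta/\alpha}}{c_2(3/(2n))^{\beta/\alpha}}\, \frac{\sqrt{(1 + c_1)(3/(2n))}}{\sqrt{1/(2n)}} + c\right), \]

hence we get

\[ \lambda_B^2 \le c\,(1/n)\log\big(c\,(1/n)^{-\beta/\alpha} + c\big) \]

where $c$ denotes a positive constant depending on $c_1$ and $c_2$ which may
be different at different places. Since

\[ c\,(1/n)\log\big(c\,(1/n)^{-\beta/\alpha} + c\big) \approx -c\,(\beta/\alpha)(1/n)\log(1/n), \]

we finally get

\[ \lambda_B^2 \le -c\,(\beta/\alpha)(1/n)\log(1/n). \]

On the other hand, for the reverse inequality, we have (see the proof of lemma
1 in [1], page 228, and formula (14) in [1], page 221)

\[ \lambda_B^2 \ge \frac{2\gamma_{jk}^2(\gamma_{jk}^2 + \tau_j^2)}{\tau_j^2} \log\left(\frac{1 - \pi_j}{\pi_j}\, \frac{\sqrt{\gamma_{jk}^2 + \tau_j^2}}{\gamma_{jk}}\right), \]

but $|\gamma_{jk}^2 - 1/n| \le 1/(2n)$, consequently one has

\[ \lambda_B^2 \ge -c\,(\beta/\alpha)(1/n)\log(1/n), \]

which completes the proof.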



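Proof of Theorem 1.

Let us place ourselves on the event $\Omega_n^{\delta}$ defined by (18) with $\delta = 1/(2n)$.

By the usual decomposition of the MISE into a variance and a bias term
we get

\[ E\|\tilde f - f\|_2^2 \le 2\Big[E\Big\|\sum_{j \le J_\alpha}\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G)\Big\|_2^2 + \Big\|\sum_{j > J_\alpha}\sum_k \beta_{jk}\psi_{jk}(G)\Big\|_2^2\Big] \le 2(V + B) \]

with

\[ V = E\Big\|\sum_{j \le J_\alpha}\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G)\Big\|_2^2, \qquad B = \Big\|\sum_{j > J_\alpha}\sum_k \beta_{jk}\psi_{jk}(G)\Big\|_2^2. \]

We first deal with the term $V$. We have

\[ \Big\|\sum_{j \le J_\alpha}\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G)\Big\|_2^2 \le J_\alpha \sum_{j \le J_\alpha} \Big\|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G)\Big\|_2^2. \]

We want to show that

\[ \Big\|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G)\Big\|_2^2 \le C \sum_k (\tilde\beta_{jk} - \beta_{jk})^2. \]

For this purpose we have

\[ \Big\|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G)\Big\|_2^2 = \int \Big|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G(x))\Big|^2 dx \]
\[ = \int \Big|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(x)\Big|^2 \frac{1}{g(G^{-1}(x))}\,dx = \Big\|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}\Big\|_{L_2(\omega)}^2 \]

where $\omega(x) = 1/(g(G^{-1}))(x)$.
Now using inequality (44) p. 1075 in [20] we have

\[ \Big\|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}\Big\|_{L_2(\omega)}^2 \le C\,2^j \sum_k |\tilde\beta_{jk} - \beta_{jk}|^2\, \omega(I_{jk}) \]

where $I_{jk}$ denotes the interval $[k/2^j, (k+1)/2^j]$ and $\omega(I_{jk}) = \int_{I_{jk}} \omega(x)\,dx$. But the
design density $g$ is bounded below by $m$. Hence we get

\[ \omega(I_{jk}) \le 2^{-j}/m \]

and consequently

\[ \Big\|\sum_k (\tilde\beta_{jk} - \beta_{jk})\psi_{jk}(G)\Big\|_2^2 \le C \sum_k (\tilde\beta_{jk} - \beta_{jk})^2. \]

We now decompose $V$ into three terms:

\[ V \le C J_\alpha\, E \sum_{j \le J_\alpha}\sum_k \big[(\tilde\beta_{jk} - \beta'_{jk})^2 + (\beta'_{jk} - \beta''_{jk})^2 + (\beta''_{jk} - \beta_{jk})^2\big] \]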
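where

\[ \beta'_{jk} = b_j\,\hat\beta_{jk}\, I\{|\hat\beta_{jk}| \ge \kappa\lambda_B\}, \]

with $\kappa$ a positive constant, and

\[ \beta''_{jk} = b_j\,\beta_{jk}, \qquad b_j = \frac{\tau_j^2}{\tau_j^2 + \gamma_{jk}^2}. \]

As a consequence we have

\[ V \le C J_\alpha (A_1 + A_2 + A_3). \]

We are now going to bound each of the terms $A_1$, $A_2$ and $A_3$ from above. We start with

\[ A_1 = \sum_{j \le J_\alpha}\sum_k E\big[(\tilde\beta_{jk} - \beta'_{jk})^2\big]. \]

As stated at the beginning of section 2.2, $\tilde\beta_{jk} = 0$ for $|\hat\beta_{jk}| < \lambda_B$.
Likewise, $\beta'_{jk} = 0$ for $|\hat\beta_{jk}| < \kappa\lambda_B$, and $\tilde\beta_{jk} - b_j\hat\beta_{jk} \to 0$ monotonically as $\hat\beta_{jk} \to \infty$.
Hence

\[ \max|\tilde\beta_{jk} - \beta'_{jk}| = b_j\lambda_B \]

which implies

\[ A_1 \le \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} b_j^2\,\lambda_B^2. \]

We have $\lambda_B \approx \sqrt{\log(n)/n}$ and $b_j \le 1$ for $j \le J_\alpha$, hence we get

\[ A_1 \le C \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} \frac{\log(n)}{n}, \]

so

\[ A_1 \le C\,\frac{\log(n)}{n} \sum_{j \le J_\alpha} 2^j \tag{23} \]

\[ \le C\,\frac{\log(n)}{n}\Big(\frac{1}{n}\Big)^{-1/\alpha}, \tag{24} \]

and finally

\[ A_1 = O\big(\log(n)\,(1/n)^{1-1/\alpha}\big). \]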
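Let us now consider the second term $A_2$:

\[ A_2 = \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} E(\beta'_{jk} - \beta''_{jk})^2 = \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} E\big(b_j\hat\beta_{jk}\, I\{|\hat\beta_{jk}| \ge \kappa\lambda_B\} - b_j\beta_{jk}\big)^2. \]

We have $b_j \le 1$, consequently it follows that

\[ A_2 \le \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} E\big((\hat\beta_{jk} - \beta_{jk})^2\, I\{|\hat\beta_{jk}| \ge \kappa\lambda_B\}\big) + E \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} \beta_{jk}^2\, I\{|\hat\beta_{jk}| < \kappa\lambda_B\} = A_2' + A_2''. \]

We have

\[ A_2' \le \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} E(\hat\beta_{jk} - \beta_{jk})^2. \]

Using inequality (64) in [20] p. 1086 we have

\[ E(\hat\beta_{jk} - \beta_{jk})^2 \le C\,\frac{1 + \|f\|_\infty^2}{n} \tag{25} \]

hence

\[ A_2' = O\big((1/n)^{1-1/\alpha}\big). \]

We now bound the term $A_2''$:

\[ A_2'' = E \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} \beta_{jk}^2\, I\{|\hat\beta_{jk}| < \kappa\lambda_B\}\big(I\{|\beta_{jk}| \le 2\kappa\lambda_B\} + I\{|\beta_{jk}| > 2\kappa\lambda_B\}\big) \]
\[ \le \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} \beta_{jk}^2\, I\{|\beta_{jk}| \le 2\kappa\lambda_B\} + \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} \beta_{jk}^2\, P(|\hat\beta_{jk} - \beta_{jk}| > \kappa\lambda_B) = T_3 + T_4. \tag{26} \]

We have

\[ T_3 \le C \sum_{j \le J_\alpha} 2^j \lambda_B^2 \le C\,\frac{\log(n)}{n}\, n^{1/\alpha} = C\log(n)\, n^{-1+1/\alpha}. \]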
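Let us focus on $T_4$. We have

\[ \hat\beta_{jk} - \beta_{jk} = (1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))\big(f(X_i) + \varepsilon_i\big) - E\,\psi_{jk}(G(X))f(X). \]

Hence

\[ P\big(|\hat\beta_{jk} - \beta_{jk}| > \kappa\sqrt{\log(n)/n}\big) \le P_1 + P_2 \]

where

\[ P_1 = P\Big(\Big|(1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))f(X_i) - E\,\psi_{jk}(G(X))f(X)\Big| > (\kappa/2)\sqrt{\log(n)/n}\Big) \tag{27} \]

and

\[ P_2 = P\Big(\Big|(1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))\,\varepsilon_i\Big| > (\kappa/2)\sqrt{\log(n)/n}\Big). \tag{28} \]

Kerkyacharian and Picard in [20], in order to prove inequality (65) in [20],
showed p. 1088 that

\[ P_1 \le 2\exp\left(-\frac{3\kappa^2\log(n)}{4\|f\|_\infty(3 + \kappa)}\right) \tag{29} \]

if $2^j \le n/\log(n)$. As for $P_2$, conditionally on $(X_1, \dots, X_n)$ we have

\[ (1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))\,\varepsilon_i \sim N(0, \gamma_{jk}^2) \]

where $\gamma_{jk}^2$ has been defined in (4). Using the exponential inequality for Gaussian
random variables we have

\[ P_2 \le E\Big(\exp\Big(-\frac{\kappa^2\log(n)}{8n\gamma_{jk}^2}\Big)\Big) = E\Big[e^{-\kappa^2\log(n)/(8n\gamma_{jk}^2)}\big(I(|\gamma_{jk}^2 - 1/n| \le 1/(2n)) + I(|\gamma_{jk}^2 - 1/n| > 1/(2n))\big)\Big] \]
\[ \le e^{-\kappa^2\log(n)/12} + P(|\gamma_{jk}^2 - 1/n| > 1/(2n)). \tag{30} \]

Using (19) with $q = 1/2$, we have for $\alpha > 1$

\[ T_4 \le \sum_{j \le J_\alpha}\sum_k \beta_{jk}^2\Big(2e^{-Cn^{1-1/\alpha}} + e^{-\kappa^2\log(n)/12} + 2\exp\Big(-\frac{3\kappa^2\log(n)}{4\|f\|_\infty(3 + \kappa)}\Big)\Big) \]
\[ \le \Big(2e^{-Cn^{1-1/\alpha}} + e^{-\kappa^2\log(n)/12} + 2\exp\Big(-\frac{3\kappa^2\log(n)}{4\|f\|_\infty(3 + \kappa)}\Big)\Big)\|f(G^{-1})\|_2^2. \]

It remains to fix $\kappa$ large enough so that we get

\[ T_4 = O\big(\log(n)\, n^{-1+1/\alpha}\big). \]

So we have for $A_2''$, with $\alpha > 1$,

\[ A_2'' = O\Big(\frac{\log(n)}{n^{1-1/\alpha}}\Big). \]

Finally we get for $A_2$

\[ A_2 = O\big(\log(n)\,(1/n)^{1-1/\alpha}\big). \]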

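Let us now consider the term $A_3$:

\[ A_3 = \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} (b_j - 1)^2\beta_{jk}^2 = \sum_{j \le J_\alpha}\sum_{k=0}^{2^j-1} \Big(\frac{\gamma_{jk}^2}{\tau_j^2 + \gamma_{jk}^2}\Big)^2 \beta_{jk}^2. \]

Since $|\gamma_{jk}^2 - 1/n| \le 1/(2n)$, we get

\[ A_3 \le \sum_{j \le J_\alpha} \Big(\frac{3/(2n)}{c_1 2^{-j\alpha} + 1/(2n)}\Big)^2 \sum_{k=0}^{2^j-1} \beta_{jk}^2, \]

but $f(G^{-1})$ belongs to the Besov ball $B^s_{2,\infty}(R)$, which entails

\[ \sum_{k=0}^{2^j-1} \beta_{jk}^2 \le M\,2^{-2js}, \]

hence

\[ A_3 \le (C/n^2) \sum_{j \le J_\alpha} 2^{2j(-s+\alpha)}. \]

We have

\[ A_3 \le (C/n^2)\,(1/n)^{-2(-s+\alpha)/\alpha} = O\big((1/n)^{2s/\alpha}\big). \]

We are now in a position to give an upper bound for the variance term $V$,
namely

\[ V \le C J_\alpha\big(\log(n)\,(1/n)^{1-1/\alpha} + (1/n)^{2s/\alpha}\big). \]

It remains to bound the bias term $B$. In [20] p. 1083, using inequality (44),
the authors have proved that for any $l$ we get

\[ \Big\|\sum_{j \ge l}\sum_k \beta_{jk}\psi_{jk}(G)\Big\|_2 \le \sum_{j \ge l} \Big\|\sum_k \beta_{jk}\psi_{jk}(G)\Big\|_2 \le C \sum_{j \ge l} 2^{j/2}\Big[\sum_k |\beta_{jk}|^2\,\omega(I_{jk})\Big]^{1/2}. \tag{31} \]

Applying (31) with, in our case of a design density bounded from below, $\omega(I_{jk}) \le 2^{-j}/m$
and $l = J_\alpha$, it follows that

\[ \Big\|\sum_{j \ge J_\alpha}\sum_k \beta_{jk}\psi_{jk}(G)\Big\|_2 \le C' \sum_{j \ge J_\alpha} \Big(\sum_k |\beta_{jk}|^2\Big)^{1/2}, \]

hence

\[ B = \Big\|\sum_{j > J_\alpha}\sum_{k=0}^{2^j-1} \beta_{jk}\psi_{jk}(G)\Big\|_2^2 \le C\,2^{-2J_\alpha s} = C\,(1/n)^{2s/\alpha}, \]

which completes the proof of Theorem 1.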

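Lemma 4. Let $w_{jk}$ be a sequence of random weights lying in $[0, 1]$. We assume
that there exist positive constants $c$, $m$ and $K$ such that, for any $n$,

\[ \tilde\beta_n = (w_{jk}\hat\beta_{jk})_{jk} \]

is a shrinkage rule verifying

\[ w_{jk}(n) = 0 \ \text{a.e.} \quad \forall j \ge J_n \ \text{with}\ 2^{J_n} \sim n/\log(n) := t_n^{-2}, \ \forall k, \tag{32} \]

\[ |\hat\beta_{jk}| \le m t_n \implies w_{jk} \le c\,t_n, \quad \forall j \le J_n, \ \forall k, \tag{33} \]

\[ (1 - w_{jk}(n)) \le K\Big(\frac{t_n}{|\hat\beta_{jk}|} + t_n\Big) \ \text{a.e.} \quad \forall j \le J_n, \ \forall k, \tag{34} \]

and let

\[ \hat f = \sum_{j < J_n}\sum_k w_{jk}\hat\beta_{jk}\psi_{jk}(G). \]

Then

\[ \sup_{f(G^{-1}) \in B^s_{2,\infty}(R)} E\|\hat f - f\|_2^2 \le C\,(\log(n)/n)^{2s/(2s+1)}. \]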



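Proof of Lemma 4.

\[ E\|\hat f - f\|_2^2 \le 2C\Big(J_n \sum_{j < J_n}\sum_k E(\tilde\beta_{jk} - \beta_{jk})^2 + \Big\|\sum_{j \ge J_n}\sum_k \beta_{jk}\psi_{jk}(G(x))\Big\|_2^2\Big) \le V_1 + B_1. \]

We first consider the term $V_1$:

\[ V_1 \le 2J_n\, E \sum_{j < J_n}\sum_k \big[w_{jk}^2(\hat\beta_{jk} - \beta_{jk})^2 + (1 - w_{jk})^2\beta_{jk}^2\big]\, I\{|\hat\beta_{jk}| \le m t_n\} \]
\[ + 2J_n\, E \sum_{j < J_n}\sum_k \big[w_{jk}^2(\hat\beta_{jk} - \beta_{jk})^2 + (1 - w_{jk})^2\beta_{jk}^2\big]\, I\{|\hat\beta_{jk}| > m t_n\} = V_1' + V_1'', \]

\[ V_1' = 2J_n(T_5 + T_6), \qquad T_5 = E \sum_{j < J_n}\sum_k w_{jk}^2(\hat\beta_{jk} - \beta_{jk})^2\, I\{|\hat\beta_{jk}| \le m t_n\}, \]

but according to (25) we have, for $2^j \le n/\log(n)$,

\[ E(\hat\beta_{jk} - \beta_{jk})^2 \le C\,\frac{1 + \|f\|_\infty^2}{n}, \]

hence using (33) it follows that

\[ T_5 \le C\,t_n^2\, 2^{J_n}/n. \]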
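As for $T_6$,

\[ T_6 = E \sum_{j < J_n}\sum_k (1 - w_{jk})^2\beta_{jk}^2\, I\{|\hat\beta_{jk}| \le m t_n\} \]
\[ \le E \sum_{j < J_n}\sum_k (1 - w_{jk})^2\beta_{jk}^2\, I\{|\hat\beta_{jk}| \le m t_n\}\big[I\{|\beta_{jk}| \le 2m t_n\} + I\{|\beta_{jk}| > 2m t_n\}\big]. \]

By (14) we get

\[ T_6 \le (2m t_n)^{4s/(2s+1)}\, \|f\|_{W_{2/(1+2s)}}^2 + \sum_{j < J_n}\sum_k \beta_{jk}^2\, P(|\hat\beta_{jk} - \beta_{jk}| > m t_n). \]

We are going to bound $P(|\hat\beta_{jk} - \beta_{jk}| > m t_n)$. We have

\[ P\big(|\hat\beta_{jk} - \beta_{jk}| > m\sqrt{\log(n)/n}\big) \le P_3 + P_4 \]

where

\[ P_3 = P\Big(\Big|(1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))f(X_i) - E\,\psi_{jk}(G(X))f(X)\Big| > (m/2)\sqrt{\log(n)/n}\Big) \tag{35} \]

and

\[ P_4 = P\Big(\Big|(1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))\,\varepsilon_i\Big| > (m/2)\sqrt{\log(n)/n}\Big). \tag{36} \]

Kerkyacharian and Picard in [20], in order to prove inequality (65) in [20],
showed p. 1088 that

\[ P_3 \le 2\exp\left(-\frac{3m^2\log(n)}{4\|f\|_\infty(3 + m)}\right) \tag{37} \]

if $2^j \le n/\log(n)$. As for $P_4$, conditionally on $(X_1, \dots, X_n)$ we have

\[ (1/n)\sum_{i=1}^{n} \psi_{jk}(G(X_i))\,\varepsilon_i \sim N(0, \gamma_{jk}^2) \]

where $\gamma_{jk}^2$ has been defined in (4). Hence

\[ P_4 \le E\Big(\exp\Big(-\frac{m^2\log(n)}{8n\gamma_{jk}^2}\Big)\Big) = E\Big[e^{-m^2\log(n)/(8n\gamma_{jk}^2)}\big(I(|\gamma_{jk}^2 - 1/n| \le q/n) + I(|\gamma_{jk}^2 - 1/n| > q/n)\big)\Big] \]
\[ \le e^{-m^2\log(n)/(8(q+1))} + P(|\gamma_{jk}^2 - 1/n| > q/n). \tag{38} \]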
Using (20) to bound $P(|\gamma_{jk}^2 - 1/n| > q/n)$ we get

\[ P(|\hat\beta_{jk} - \beta_{jk}| > m t_n) \le 2e^{-q^2\log(n)/(C\|\psi\|_4^4 + q\|\psi\|_\infty^2)} + e^{-m^2\log(n)/(8(q+1))} + 2e^{-3m^2\log(n)/(4\|f\|_\infty(3+m))}, \]

thus

\[ P(|\hat\beta_{jk} - \beta_{jk}| > m t_n) \le 2n^{-q^2/(C\|\psi\|_4^4 + q\|\psi\|_\infty^2)} + n^{-m^2/(8(q+1))} + 2n^{-3m^2/(4\|f\|_\infty(3+m))}, \tag{39} \]

which entails, by fixing $m$ and $q$ large enough,

\[ T_6 \le (2m t_n)^{4s/(2s+1)}\, \|f\|_{W_{2/(1+2s)}}^2 + \|f(G^{-1})\|_2^2/n. \]

Let us look at the term $V_1''$:

\[ V_1'' = 2J_n\, E \sum_{j < J_n}\sum_k \big[w_{jk}^2(\hat\beta_{jk} - \beta_{jk})^2 + (1 - w_{jk})^2\beta_{jk}^2\big]\, I\{|\hat\beta_{jk}| > m t_n\}\big[I\{|\beta_{jk}| \le m t_n/2\} + I\{|\beta_{jk}| > m t_n/2\}\big] \]
\[ = 2J_n(T_7 + T_8). \]

For the term $T_7$, we use the Cauchy-Schwarz inequality:

\[ T_7 \le \sum_{j < J_n}\sum_k \big(E(\hat\beta_{jk} - \beta_{jk})^4\big)^{1/2}\big(P(|\hat\beta_{jk} - \beta_{jk}| > m t_n/2)\big)^{1/2} + E\sum_{j < J_n}\sum_k \beta_{jk}^2\, I\{|\hat\beta_{jk}| > m t_n\}\, I\{|\beta_{jk}| \le m t_n/2\}. \]

Furthermore, using inequality (64) p. 1086 in [20] we get, for $2^j \le n/\log(n)$,

\[ E(\hat\beta_{jk} - \beta_{jk})^4 \le C\,\frac{1 + \|f\|_\infty^4}{n^2} \tag{40} \]

and by (39)

\[ P(|\hat\beta_{jk} - \beta_{jk}| > m t_n/2) \le 2n^{-q^2/(C\|\psi\|_4^4 + q\|\psi\|_\infty^2)} + n^{-m^2/(32(q+1))} + 2n^{-3m^2/(16\|f\|_\infty(3+m))}, \]
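from which it follows, by fixing again $m$ and $q$ large enough, that

\[ T_7 \le (C/n)\,2^{J_n}\Big(2n^{-q^2/(C\|\psi\|_4^4 + q\|\psi\|_\infty^2)} + n^{-m^2/(32(q+1))} + 2n^{-3m^2/(16\|f\|_\infty(3+m))}\Big)^{1/2} + \sum_{j < J_n}\sum_k \beta_{jk}^2\, I\{|\beta_{jk}| \le m t_n/2\} \]
\[ \le \frac{C}{n} + \big((m/2)\,t_n\big)^{4s/(1+2s)}\, \|f\|_{W_{2/(1+2s)}}^2. \]

For the term $T_8$,

\[ T_8 = E \sum_{j < J_n}\sum_k \big(w_{jk}^2(\hat\beta_{jk} - \beta_{jk})^2 + (1 - w_{jk})^2\beta_{jk}^2\big)\, I\{|\hat\beta_{jk}| > m t_n\}\, I\{|\beta_{jk}| > m t_n/2\} \]
\[ \le \frac{C}{n}\,\frac{2^{2-2/(2s+1)}\,(m t_n/2)^{-2/(2s+1)}}{1 - 2^{-2/(1+2s)}}\, \|f\|_{W_{2/(1+2s)}}^2 \]
\[ + E \sum_{j < J_n}\sum_k (1 - w_{jk})^2\beta_{jk}^2\, I\{|\hat\beta_{jk}| > m t_n\}\, I\{|\beta_{jk}| > m t_n/2\}\big[I\{|\hat\beta_{jk}| > |\beta_{jk}|/2\} + I\{|\hat\beta_{jk}| \le |\beta_{jk}|/2\}\big]. \]

Hereafter we decompose

\[ E \sum_{j < J_n}\sum_k (1 - w_{jk})^2\beta_{jk}^2\, I\{|\hat\beta_{jk}| > m t_n\}\, I\{|\beta_{jk}| > m t_n/2\}\big[I\{|\hat\beta_{jk}| > |\beta_{jk}|/2\} + I\{|\hat\beta_{jk}| \le |\beta_{jk}|/2\}\big] = T_8'' + T_8', \]

\[ T_8' \le \sum_{j < J_n}\sum_k \beta_{jk}^2\, P(|\hat\beta_{jk} - \beta_{jk}| > m t_n/4); \]

using (39), for $m$ and $q$ large enough this term is negligible compared with $t_n^{4s/(2s+1)}$.
As for $T_8''$,

\[ T_8'' = E \sum_{j < J_n}\sum_k (1 - w_{jk})^2\beta_{jk}^2\, I\{|\hat\beta_{jk}| > m t_n\}\, I\{|\beta_{jk}| > m t_n/2\}\, I\{|\hat\beta_{jk}| > |\beta_{jk}|/2\}, \]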
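using (34) we get

\[ T_8'' \le E \sum_{j < J_n}\sum_k \beta_{jk}^2\, K^2\Big(\frac{t_n}{|\hat\beta_{jk}|} + t_n\Big)^2 I\{|\hat\beta_{jk}| > |\beta_{jk}|/2\}\, I\{|\beta_{jk}| > m t_n/2\} \]
\[ \le 2K^2 \sum_{j < J_n}\sum_k \beta_{jk}^2\Big(\frac{4t_n^2}{\beta_{jk}^2} + t_n^2\Big) I\{|\beta_{jk}| > m t_n/2\} \]
\[ \le 8K^2 t_n^2 \sum_{j < J_n}\sum_k I\{|\beta_{jk}| > m t_n/2\} + 2K^2 t_n^2\, \|f(G^{-1})\|_2^2; \]

using (15) it follows that

\[ T_8'' \le 8K^2 t_n^2\, \Big(\frac{m t_n}{2}\Big)^{-2/(1+2s)} \frac{2^{2-2/(1+2s)}}{1 - 2^{-2/(1+2s)}}\, \|f\|_{W_{2/(1+2s)}}^2 + 2K^2 t_n^2\, \|f(G^{-1})\|_2^2 \]
\[ \le C\,t_n^{4s/(1+2s)} + 2K^2 t_n^2\, \|f(G^{-1})\|_2^2. \]

It remains to bound the bias term $B_1$. To this purpose we use the fact that

\[ B_1 = \Big\|\sum_{j \ge J_n}\sum_{k=0}^{2^j-1} \beta_{jk}\psi_{jk}(G)\Big\|_2^2 \le C\,2^{-2J_n s} = C\,t_n^{4s} \le C\,t_n^{4s/(2s+1)}, \]

which completes the proof.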



Proof of Theorem 2.

In order to prove Theorem 2, we have to prove that the Bayesian
estimators (5) based on Gaussian priors with large variance (11) and (12)
satisfy the conditions of Lemma 4.

We will not go into the details of the proof because it is identical to
the proof of Theorem 3 in [5], with the sole exception that here we
place ourselves on the event $\Omega_n^{\delta}$ with $\delta = q/n$, $q$ being some positive constant.
Indeed, as stated above in section 2.2, a key observation is that instead
of having a deterministic noise $\varepsilon = 1/\sqrt{n}$ as in [5], here we have to deal
with a stochastic noise $\gamma_{jk}$ whose expression is given by (4).